Statistical Inference for Cluster Trees

Authors

  • Jisu Kim
  • Yen-Chi Chen
  • Sivaraman Balakrishnan
  • Alessandro Rinaldo
  • Larry A. Wasserman

Abstract

A cluster tree provides a highly interpretable summary of a density function by representing the hierarchy of its high-density clusters. It is estimated using the empirical tree, which is the cluster tree constructed from a density estimator. This paper addresses the basic question of quantifying our uncertainty by assessing the statistical significance of topological features of an empirical cluster tree. We first study a variety of metrics that can be used to compare different trees, analyze their properties, and assess their suitability for inference. We then propose methods to construct and summarize confidence sets for the unknown true cluster tree. We introduce a partial ordering on cluster trees, which we use to prune some of the statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, we illustrate the proposed methods on a variety of synthetic examples and demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set.
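The abstract defines the empirical cluster tree as the cluster tree of a density estimator. The sketch below illustrates that idea in one dimension under assumptions of our own, not taken from the paper: a Gaussian kernel density estimator with a fixed bandwidth `h`, a uniform evaluation grid, and a finite sweep of density levels. Each connected component of an upper level set {x : p(x) >= λ} becomes a tree node whose parent is the enclosing component at the next lower level. The abstract does not fix a metric for comparing trees; the sup-norm distance between density estimates is assumed here as one natural choice, with a nonparametric bootstrap quantile used as the radius of an illustrative confidence set. All function names (`kde`, `level_components`, `cluster_tree`, `leaves`, `sup_norm`, `bootstrap_radius`) and the toy data are hypothetical.

```python
# Illustrative sketch only: the Gaussian KDE, the sup-norm metric and the
# bootstrap radius are assumptions, not the paper's stated procedures.
import math
import random


def kde(data, grid, h):
    """Gaussian kernel density estimate of `data`, evaluated at every grid point."""
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) for x in grid]


def level_components(density, lam):
    """Connected components of the upper level set {x : p(x) >= lam} on a 1-D grid,
    returned as inclusive (start, end) index intervals."""
    comps, start = [], None
    for i, d in enumerate(density):
        if d >= lam and start is None:
            start = i  # entering a component
        elif d < lam and start is not None:
            comps.append((start, i - 1))  # leaving a component
            start = None
    if start is not None:
        comps.append((start, len(density) - 1))
    return comps


def cluster_tree(density, n_levels=50):
    """Sweep levels upward; each component becomes a node whose parent is the
    component at the previous (lower) level that contains it.
    Nodes are tuples (level, start, end, parent_index)."""
    top = max(density)
    nodes, prev = [], []
    for k in range(n_levels):
        lam = top * k / n_levels
        current = []
        for s, e in level_components(density, lam):
            # Upper level sets are nested, so one lower component contains this one.
            parent = next((j for j in prev if nodes[j][1] <= s and e <= nodes[j][2]), None)
            nodes.append((lam, s, e, parent))
            current.append(len(nodes) - 1)
        prev = current
    return nodes


def leaves(nodes):
    """Indices of nodes with no children (the tops of the tree's branches)."""
    parents = {node[3] for node in nodes}
    return [i for i in range(len(nodes)) if i not in parents]


def sup_norm(p, q):
    """Sup-norm (ell-infinity) distance between two densities on the same grid."""
    return max(abs(a - b) for a, b in zip(p, q))


def bootstrap_radius(data, grid, h, alpha=0.05, n_boot=200, seed=0):
    """(1 - alpha) bootstrap quantile of sup-norm deviations of resampled KDEs
    from the original KDE; used here as the radius of a confidence set."""
    rng = random.Random(seed)
    p_hat = kde(data, grid, h)
    stats = sorted(
        sup_norm(kde([rng.choice(data) for _ in data], grid, h), p_hat)
        for _ in range(n_boot)
    )
    return stats[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]


# Usage: two well-separated groups of points (toy data, not from the paper).
data = [-3 + 0.1 * k for k in range(-5, 6)] + [3 + 0.1 * k for k in range(-5, 6)]
grid = [-6 + 0.05 * i for i in range(241)]
h = 0.5
p_hat = kde(data, grid, h)
tree = cluster_tree(p_hat)
t = bootstrap_radius(data, grid, h)
print("leaves:", len(leaves(tree)))
print("confidence-set radius t:", round(t, 4))
```

Nodes with more than one child mark splits in the hierarchy; here the two separated groups yield two leaves. Under the assumed metric, any density within sup-norm distance `t` of `p_hat` lies in the illustrative confidence set. The paper's own metrics, partial ordering, and pruning rule are not reproduced by this sketch.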

Similar resources

Fuzzy Logic

We describe the basics of fuzzy sets and fuzzy logic. Based upon the concept of linguistic values, which describe imprecise concepts using words, the basics of fuzzy rules and fuzzy inference are introduced. In the second part we briefly explain applications of fuzzy rules for function approximation using fuzzy graphs, clustering using fuzzy algorithms, and classification under uncertainty usin...

A Method of Alignment Masking for Refining the Phylogenetic Signal of Multiple Sequence Alignments

Inaccurate inference of positional homologies in multiple sequence alignments and systematic errors introduced by alignment heuristics obfuscate phylogenetic inference. Alignment masking, the elimination of phylogenetically uninformative or misleading sites from an alignment before phylogenetic analysis, is a common practice in phylogenetic analysis. Although masking is often done manually, aut...

Bayesian Inference for Color Image Quantization via Model-Based Clustering Trees

We consider the problem of color image quantization, or clustering of the color space. We propose a new methodology for doing this, called model-based clustering trees. This is grounded in model-based clustering, which bases inference on finite mixture models estimated by maximum likelihood using the EM algorithm, and automatically chooses the number of clusters by Bayesian model selection, a...

Co-clustering Spatial Data Using a Generalized Linear Mixed Model With Application to the Integrated Pest Management

Co-clustering has been broadly applied to many domains such as bioinformatics and text mining. However, model-based spatial co-clustering has not been studied. In this paper, we develop a co-clustering method using a generalized linear mixed model for spatial dat...

Publication date: 2016